Good Word Attacks on Statistical Spam Filters
Abstract
Unsolicited commercial email is a significant problem for users and providers of email services. While statistical spam filters have proven useful, senders of spam are learning to bypass these filters by systematically modifying their email messages. In a good word attack, one of the most common techniques, a spammer modifies a spam message by inserting or appending words indicative of legitimate email. In this paper, we describe and evaluate the effectiveness of active and passive good word attacks against two types of statistical spam filters: naive Bayes and maximum entropy filters. We find that in passive attacks without any filter feedback, an attacker can get 50% of currently blocked spam past either filter by adding 150 words or fewer. In active attacks allowing test queries to the target filter, 30 words will get half of blocked spam past either filter.
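The passive good word attack described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy corpora `SPAM` and `HAM`, the helper names `word_log_odds`, `spam_score` and `good_word_attack`, the equal class priors, and the simplification that the attacker's word statistics match the filter's. It scores a message with a small multinomial naive Bayes model and appends the most ham-indicative words until the message is no longer classified as spam.

```python
import math
from collections import Counter

# Toy training data (hypothetical, for illustration only).
SPAM = ["cheap pills buy now", "buy cheap watches now", "free pills offer now"]
HAM = ["meeting agenda attached", "lunch meeting tomorrow",
       "project agenda review tomorrow"]


def train(docs):
    """Return per-word counts and the total token count for one class."""
    counts = Counter(w for d in docs for w in d.split())
    return counts, sum(counts.values())


spam_counts, spam_total = train(SPAM)
ham_counts, ham_total = train(HAM)
VOCAB = set(spam_counts) | set(ham_counts)


def word_log_odds(w):
    """log P(w|spam) - log P(w|ham), with Laplace (add-one) smoothing."""
    p_spam = (spam_counts[w] + 1) / (spam_total + len(VOCAB))
    p_ham = (ham_counts[w] + 1) / (ham_total + len(VOCAB))
    return math.log(p_spam / p_ham)


def spam_score(message):
    """Sum of per-token log-odds; > 0 means 'spam' (equal priors assumed)."""
    return sum(word_log_odds(w) for w in message.split() if w in VOCAB)


def good_word_attack(message, max_words=150):
    """Append ham-indicative words, best first, until the score is <= 0.

    If max_words is exhausted first, the returned message may still be spam.
    """
    # Most ham-like words first; ties broken alphabetically for determinism.
    good_words = sorted(VOCAB, key=lambda w: (word_log_odds(w), w))
    added = []
    for w in good_words[:max_words]:
        if spam_score(message) <= 0:
            break
        message += " " + w
        added.append(w)
    return message, added


original = "buy cheap pills now"
attacked, added = good_word_attack(original)
print(spam_score(original) > 0)   # original is classified as spam
print(spam_score(attacked) <= 0)  # attacked message passes the toy filter
print(added)                      # the good words that were appended
```

In an active attack, the attacker would instead choose words by sending test queries to the target filter rather than relying on their own word statistics; that variant is not shown here.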
Similar Resources
A Multiple Instance Learning Strategy for Combating Good Word Attacks on Spam Filters
Statistical spam filters are known to be vulnerable to adversarial attacks. One of the more common adversarial attacks, known as the good word attack, thwarts spam filters by appending to spam messages sets of “good” words, which are words that are common in legitimate email but rare in spam. We present a counterattack strategy that attempts to differentiate spam from legitimate email in the in...
Camouflages and Token Manipulations-The Changing Faces of the Nigerian Fraudulent 419 Spammers
The inefficiency of current spam filters against fraudulent (419) mails is not unrelated to spammers' use of good-word attacks, topic drifts, parasitic spamming, wrong categorization and recategorization of electronic mails by e-mail clients and, of course, the fuzzy factors of greed and gullibility on the part of recipients who respond to fraudulent spam mail offers. In this paper, ...
Effectiveness and Limitations of Statistical Spam Filters
Spam not only clogs Internet traffic by consuming a hefty amount of network bandwidth but is also a source of e-mail-borne viruses, spyware, adware and Trojan horses. It is also used to carry out denial of service, directory harvesting and phishing attacks that directly cause financial losses. Further, the contents of spam are often offensive and contain adult-oriented and fraudule...
Denial of Information Attacks in Event Processing
Automated Denial of Information Attacks. It is a common assumption in event processing that the events are “clean”, i.e., they come from well-behaved and trustworthy sources. This assumption does not hold in all major open communications media for several reasons. First, adversaries may spread massive noise data, e.g., in email spam. Second, adversaries may inject potentially interesting, but o...
Reversing the effects of tokenisation attacks against content-based spam filters
Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms and phishing. More than 85% of received e-mails are spam. Historical approaches to combating these messages, including simple techniques like sender blacklisting or the use of e-mail signatures, are no longer completely reliable. Many current solutions feature machine-learning ...